[New Transformer] ChiMergeDiscretisor #459
Morgan-Sell wants to merge 22 commits into feature-engine:main
Hi @Morgan-Sell, thanks for kicking this off. I reckon this one is not ready for review yet, right? It would be great to have some tests with the expected results for the transformer. Thank you!
Hola @solegalli, correct, it's not ready for review. I'm still working on it. And yes, I'll create tests.
Should we avoid dataframes and use dictionaries and NumPy arrays instead? I suspect iterating through dataframes increases the computational cost. I'll leave the question here, but I think the answer is "yes": dictionaries and NumPy arrays simplify merging the frequency distributions ;)
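A minimal sketch of the array-based approach, assuming a single numeric feature x and a categorical target y. The helpers frequency_table and merge_adjacent are hypothetical names for illustration, not the transformer's actual methods. Rows of the 2-D count array are intervals (initially one per unique value), columns are classes, and the 1-D array holds each interval's lower boundary, so merging two adjacent intervals reduces to summing two rows.

```python
import numpy as np


def frequency_table(x, y):
    """Return (sorted unique values of x, classes, 2-D count array).

    counts[i, j] = number of samples with x == values[i] and y == classes[j].
    Hypothetical helper for illustration only.
    """
    values, row_idx = np.unique(x, return_inverse=True)
    classes, col_idx = np.unique(y, return_inverse=True)
    counts = np.zeros((len(values), len(classes)), dtype=int)
    # Add one count per sample; np.add.at accumulates repeated index pairs.
    np.add.at(counts, (row_idx, col_idx), 1)
    return values, classes, counts


def merge_adjacent(values, counts, i):
    """Merge interval i with interval i + 1 (hypothetical helper).

    The merged interval keeps the lower boundary values[i]; its class
    counts are the sum of the two rows.
    """
    merged_row = counts[i] + counts[i + 1]
    new_counts = np.vstack([counts[:i], merged_row, counts[i + 2:]])
    new_values = np.delete(values, i + 1)
    return new_values, new_counts
```

For example, x = [1, 1, 2, 3, 3, 3] with y = [0, 1, 0, 1, 1, 0] gives counts [[1, 1], [1, 0], [1, 2]]; merging interval 0 with interval 1 gives boundaries [1, 3] and counts [[2, 1], [1, 2]].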
…hod now returns a 2-D NumPy array and a 1-D NumPy array instead of a dictionary.
… New method is incomplete. There is an issue with some of the chi-square calculations; it only happens with certain distributions.
…the first 2 and last 2 chi-square values do not match the expected results, while the other 9 chi-square values do match. Unsure what is causing the discrepancy.
Hola @solegalli, I think I need an extra set of eyes. I'm struggling to identify what is causing the error in test_chi_merge(). I believe the root cause is in _calc_chi_square(); however, I cannot pinpoint where.
In test_chi_merge(), the expected results are:
The transformer returns the following results:
The above values are the results of the chi-square tests on consecutive distributions. 5 of the 12 values do not match the expected results; the mismatched values are at indices 0, 1, 6, 10, and 11. Do you see the bug?
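As an independent reference to compare against, here is a minimal sketch of the chi-square statistic for two adjacent intervals: with R_i the row totals, C_j the class totals and N the total count, the expected count is E_ij = R_i * C_j / N and the statistic is the sum of (A_ij - E_ij)^2 / E_ij. The names chi_square_adjacent, chi_square_all and zero_expected are hypothetical. Replacing a zero expected count (a class absent from both intervals) with a small value such as 0.1 is an assumed convention; implementations may handle that case differently, which would affect only the pairs where a class is missing. Also, if the expected values in the test were produced with scipy.stats.chi2_contingency, note that it applies Yates' continuity correction by default (correction=True) when the table has one degree of freedom, and it raises an error when an expected frequency is zero.

```python
import numpy as np


def chi_square_adjacent(counts, zero_expected=0.1):
    """Chi-square statistic for a 2 x k table of two adjacent intervals.

    counts: shape (2, k); rows are intervals, columns are classes.
    zero_expected: value substituted for an expected count of 0
    (an assumed convention, not confirmed for this transformer).
    """
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)  # R_i, shape (2, 1)
    col_totals = counts.sum(axis=0, keepdims=True)  # C_j, shape (1, k)
    total = counts.sum()                             # N
    # E_ij = R_i * C_j / N, broadcast to shape (2, k).
    expected = row_totals * col_totals / total
    # Avoid division by zero when a class is absent from both intervals.
    expected = np.where(expected == 0, zero_expected, expected)
    return float(((counts - expected) ** 2 / expected).sum())


def chi_square_all(counts, zero_expected=0.1):
    """Chi-square for every pair of consecutive rows of an (m, k) table."""
    counts = np.asarray(counts, dtype=float)
    return np.array([
        chi_square_adjacent(counts[i:i + 2], zero_expected)
        for i in range(len(counts) - 1)
    ])
```

For instance, chi_square_adjacent([[1, 1], [1, 0]]) returns 0.75, which matches the closed form N(ad - bc)^2 / (R_1 R_2 C_1 C_2) = 3 / 4 for a 2 x 2 table.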
Hola @solegalli, did you have a chance to look at this bug? I'm stumped.
Hi @solegalli, are you still planning to review this discretizer? I think it's super cool! I know you're quite busy; I'm just trying to organize myself.
Still pending. Shall I send you an email? Would that work?
Closes #450.
Notes from #450:
Existing implementations:
https://github.com/lisette-espin/pychimerge
https://github.com/night18/ChiMerge
https://github.com/raiyan1102006/ChiMerge
https://gist.github.com/alanzchen/17d0c4a45d59b79052b1cd07f531689e?short_path=f2e54c6
A reference to the original article can be found in the first link.
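For orientation, a minimal end-to-end sketch of the bottom-up ChiMerge loop: start with one interval per unique value, repeatedly merge the adjacent pair with the lowest chi-square, and stop once every pair reaches the threshold. The function chimerge_boundaries and its parameters threshold and max_intervals are hypothetical, not the feature-engine API, and combining the two stopping rules this way is an assumption. The threshold could come from scipy.stats.chi2.ppf(1 - alpha, df=n_classes - 1), which is about 3.841 for two classes at alpha = 0.05. Ties are broken by merging the leftmost pair; other implementations may handle ties differently.

```python
import numpy as np


def _chi2(pair, zero_expected=0.1):
    # Chi-square of a 2 x k table (two adjacent intervals x classes).
    expected = (pair.sum(axis=1, keepdims=True)
                * pair.sum(axis=0, keepdims=True) / pair.sum())
    # Assumed convention: replace zero expected counts with a small value.
    expected = np.where(expected == 0, zero_expected, expected)
    return ((pair - expected) ** 2 / expected).sum()


def chimerge_boundaries(x, y, threshold, max_intervals=None):
    """Return the lower boundaries of the intervals found by ChiMerge.

    threshold: chi-square critical value; merging stops once every
        adjacent pair has chi-square >= threshold.
    max_intervals: optional cap; keep merging while there are more
        intervals than this (hypothetical parameter).
    """
    values, row_idx = np.unique(x, return_inverse=True)
    _, col_idx = np.unique(y, return_inverse=True)
    counts = np.zeros((len(values), col_idx.max() + 1))
    np.add.at(counts, (row_idx, col_idx), 1)
    bounds = list(values)
    rows = list(counts)

    while len(rows) > 1:
        chis = [_chi2(np.vstack(rows[i:i + 2])) for i in range(len(rows) - 1)]
        i = int(np.argmin(chis))  # leftmost pair with the lowest chi-square
        too_many = max_intervals is not None and len(rows) > max_intervals
        if chis[i] >= threshold and not too_many:
            break
        # Merge interval i + 1 into interval i: sum counts, drop its boundary.
        rows[i] = rows[i] + rows[i + 1]
        del rows[i + 1]
        del bounds[i + 1]
    return np.array(bounds)
```

With x = [1, 2, 3, 4, 5, 6], y = [0, 0, 0, 1, 1, 1] and threshold=3.841, this sketch returns boundaries [1, 4]; with max_intervals=1 it merges everything into a single interval starting at 1.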